Abstract.

Based on a model first studied in [Li, 1991a], properties of the correlation function of expansion-modification systems are developed. The existence of several characteristic exponents is proved, and the relationship of this fact with long-range correlation in DNA is established. Comparisons between the theoretical exponents and those obtained from simulations and real sequences are also shown.
Introduction.



The discovery of the DNA molecule has revolutionised our way of thinking about biological evolution [10]. One of the most important challenges of our time is understanding how it functions because, notwithstanding the huge number of sequenced base pairs, the rate at which these data are interpreted lags behind the rate at which they are acquired.
One of the most fruitful lines of research in recent years concerns long-range correlation in DNA [Borsnitk, 1993], [Buldyrev, 1995], [Burks & Farmer, 1984], [Chatzidimitrou-Dreismann & Larhamar, 1993], [Karlin & Brendel, 1993], [Larhamar & Chatzidimitrou-Dreismann, 1993], [Li, 1987], [Li, 1989], [Li, 1992], [Li & Kaneko, 1992], [Li & Kaneko, 1992a], [Mansilla & Mateo-Reig, 1995], [Miramontes, 1992], [Nee, 1992], [Peng et al., 1992], [Peng et al., 1993], [Prabhu & Claverie, 1992], [Voss, 1992]. Most of these papers report experimental evidence of long-range correlation [Borsnitk, 1993], [Buldyrev, 1995], [Chatzidimitrou-Dreismann & Larhamar, 1993], [Karlin & Brendel, 1993], [Li, 1992], [Li & Kaneko, 1992], [Peng et al., 1992], [Prabhu & Claverie, 1992], and only a few develop theoretical models aimed at explaining this property [Li, 1991], [Li, 1992], [Mansilla & Mateo-Reig, 1995]. In [Li, 1989] a model is introduced, further studied in [Li, 1991a], which captures the main features in the formation of long-range correlation in DNA: point mutation and insertion (see, for example, [Mansilla & Mateo-Reig, 1995]).

In [Li, 1991a] a behaviour of the correlation function of the form $\Gamma(d) \sim 1/d^{\alpha}$ is assumed, where $d$ is the distance between symbols in the sequence and $\alpha$ is supposed to depend on the probability $p$ of mutation. In [Chatzidimitrou-Dreismann & Larhamar, 1993] a behaviour of the form $d^{\beta}$ is proposed for a magnitude inversely proportional to the correlation function. In [Buldyrev et al., 1995] and [Li & Kaneko, 1992] the correlation function of a group of GENBANK sequences is calculated, and the behaviour of these functions suggests the existence of more than one exponent, e.g. a behaviour of the form $\Gamma(d) \sim \sum_i K_i / d^{\alpha_i}$. The results of simulations in [Li, 1989] also suggest this behaviour.

The aim of this paper is to prove that, for the model proposed in [Li, 1991a], the correlation function has the form $\Gamma(d) = \sum_i K_i \varphi_i(d)$. We obtain the constants $K_i$ and asymptotic upper and lower bounds for the functions $\varphi_i(d)$. We also prove that one of these exponents contributes more to the correlation function than the others, and show that it agrees with the exponents obtained in simulations.

The structure of this paper is as follows. In Sec. 1 some preliminary definitions are reviewed. In Sec. 2 the fundamental mathematical results are developed; among them the most important are Eq. (2.6) and Eq. (2.11). In Sec. 3 upper and lower bounds are obtained for the eigenvalues of the matrices $M(i_{k-1},i_k)$ that appear in Eq. (2.6). As stated in (3.10a) and (3.10b), those bounds depend on the sum $S(d,n)$ and the product $M(d,n)$ of the elements of certain multi-indices. In Sec. 4 upper and lower bounds for $S(d,n)$ and $M(d,n)$ are obtained. In Sec. 5, uniform upper and lower bounds on the sets $G(d,n)$ are obtained for the quantities $\delta_j(i_1,\dots,i_n)$ appearing in (2.12); from this result, asymptotic expressions are obtained for the exponents in the correlation function. In Sec. 6 the exponent of the major contributing term is calculated, and this theoretical exponent is compared in a plot with the one obtained from simulations. In Sec. 7 the results are discussed, and Sec. 8 contains the conclusions.



1 Some Preliminary Definitions. 

The evolution of prebiotic nucleotide sequences is an instance in which two competing processes play an important role in determining the statistical properties of the sequences. Among the modifications that DNA sequences undergo, replications and point mutations are, in some sense, antagonistic. Replications insert copies of segments of the chain at other sites, creating long-range correlation, while point mutations tend to destroy it. If prebiotic evolution consisted only of replications, the limiting sequence would be periodic; if the point mutation rate were too high, the limiting sequence would be random. Only when the two processes are in an appropriate balance can nucleotide sequences show nontrivial long-range correlation, as observed in Nature.

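In [Li, 1991a] a model is proposed that captures the main features of these processes. Let $x^t = \cdots a_1^t a_2^t \cdots$ be a binary sequence. It is mapped to $x^{t+1} = \cdots a_1^{t+1} a_2^{t+1} \cdots$ using the following rules: each symbol $a_i^t$ changes into two identical symbols with probability $1-p$ and switches to the other symbol with probability $p$, i.e.,

$$0 \to \begin{cases} 00 & \text{with probability } 1-p,\\ 1 & \text{with probability } p,\end{cases} \qquad 1 \to \begin{cases} 11 & \text{with probability } 1-p,\\ 0 & \text{with probability } p.\end{cases}$$

A particular realisation of this rewriting process is shown in Fig. 1 of [Li, 1991a].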
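The rewriting rule can be simulated directly. The following is a minimal Python sketch, not code from [Li, 1991a]; the function names `expand_modify` and `evolve`, and the choice of starting from a single random symbol, are illustrative assumptions.

```python
import random


def expand_modify(seq, p, rng):
    """Apply one step of the expansion-modification rule to a list of 0/1 symbols.

    With probability p a symbol switches to the other symbol (point mutation);
    otherwise, with probability 1 - p, it is replaced by two copies of itself.
    """
    out = []
    for a in seq:
        if rng.random() < p:
            out.append(1 - a)   # modification: 0 -> 1, 1 -> 0
        else:
            out.extend((a, a))  # expansion: 0 -> 00, 1 -> 11
    return out


def evolve(p, steps, seed=0):
    """Start from one random symbol and apply the rule `steps` times."""
    rng = random.Random(seed)
    seq = [rng.randint(0, 1)]
    for _ in range(steps):
        seq = expand_modify(seq, p, rng)
    return seq


if __name__ == "__main__":
    x = evolve(p=0.1, steps=15, seed=42)
    # Each symbol produces 2 symbols with probability 1 - p and 1 with
    # probability p, so the expected length grows by a factor (2 - p) per step.
    print(len(x), sum(x) / len(x))
```

With p = 0 the sequence simply doubles at every step, and with p = 1 the symbols only flip; the nontrivial regime discussed above lies between these extremes.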

We borrow some definitions. Let $P^t_{\alpha,\beta}(d)$ be the joint probability of having the symbol pair $\alpha,\beta$ separated by the distance $d$ at time $t$.

Assuming that the transition probability from an $\alpha',\beta'$ pair initially at distance $d'$ to an $\alpha,\beta$ pair at distance $d$ is $T(\alpha',\beta',d' \to \alpha,\beta,d)$, the joint probabilities satisfy the following dynamical equation:
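$$\begin{pmatrix} P^{t+1}_{0,0}(d)\\ P^{t+1}_{0,1}(d)\\ P^{t+1}_{1,0}(d)\\ P^{t+1}_{1,1}(d)\end{pmatrix} = \sum_{d'=[d/2]}^{d}\begin{pmatrix} T(00\to00) & T(01\to00) & T(10\to00) & T(11\to00)\\ T(00\to01) & T(01\to01) & T(10\to01) & T(11\to01)\\ T(00\to10) & T(01\to10) & T(10\to10) & T(11\to10)\\ T(00\to11) & T(01\to11) & T(10\to11) & T(11\to11)\end{pmatrix}\begin{pmatrix} P^{t}_{0,0}(d')\\ P^{t}_{0,1}(d')\\ P^{t}_{1,0}(d')\\ P^{t}_{1,1}(d')\end{pmatrix} \qquad (1.1)$$

here we have written $T(\alpha'\beta' \to \alpha\beta)$ instead of $T(\alpha',\beta',d' \to \alpha,\beta,d)$ for simplicity.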

From now on, square brackets $[\cdot]$ denote the integer part of a number.

The transition probabilities $T(\alpha',\beta',d' \to \alpha,\beta,d)$ can be grouped into three types:

$T_0(d',d,p)$: both symbols are kept unchanged, for instance $T(0,0 \to 0,0)$.
$T_1(d',d,p)$: one symbol changes, for instance $T(0,1 \to 1,1)$.
$T_2(d',d,p)$: both symbols change, for instance $T(1,1 \to 0,0)$.

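Hence Eq. (1.1) can be written as:

$$P^{t+1}(d) = \sum_{d'=[d/2]}^{d} T(d',d,p)\,P^{t}(d') \qquad (1.2)$$

where:

$$P^{t}(d) = \begin{pmatrix} P^{t}_{0,0}(d)\\ P^{t}_{0,1}(d)\\ P^{t}_{1,0}(d)\\ P^{t}_{1,1}(d)\end{pmatrix}, \qquad T(d',d,p) = \begin{pmatrix} T_0 & T_1 & T_1 & T_2\\ T_1 & T_0 & T_2 & T_1\\ T_1 & T_2 & T_0 & T_1\\ T_2 & T_1 & T_1 & T_0\end{pmatrix}$$

In the above matrix we have written $T_s$ instead of $T_s(d',d,p)$ for simplicity.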
Suppose that a time-invariant state is reached in the limit $t \to +\infty$, so that the superscript can be dropped. Then Eq. (1.2) can be written in the following form:

$$P(d) = \sum_{d'=[d/2]}^{d} T(d',d,p)\,P(d') \qquad (1.3)$$

or:

$$P(d) = \bigl(I - T(d,d,p)\bigr)^{-1} \sum_{d'=[d/2]}^{d-1} T(d',d,p)\,P(d') \qquad (1.4)$$

In Sec. 2, starting from Eq. (1.4), we obtain a closed expression for $P(d)$.



2 Some Fundamental Mathematical Results. 



Definition 2.1: Let $d \ge n \ge 1$ be integers. Let $G(d,n)$ be the set of elements $(i_1,\dots,i_n) \in \{1,\dots,d\}^n$ that satisfy the following conditions:

a) $i_1 = 1$, $i_n = d$.

b) $i_1 < i_2 < \dots < i_n$.

c) For every $l = 1,\dots,n-1$: $[i_{l+1}/2] \le i_l$.

Examples of such sets are:

$G(3,2) = \{(1,3)\}$
$G(4,3) = \{(1,2,4),\,(1,3,4)\}$
$G(d,d) = \{(1,2,\dots,d)\}$
$G(4,2) = \emptyset$

In general, $G(d,2) = \emptyset$ for every $d \ge 4$. To prove this, note that $G(d,2) \subseteq \{(1,d)\}$, and since condition c) of Definition 2.1 must hold, $[d/2] \le 1$, which is only possible for $d \le 3$.

Definition 2.2: Let $d > k \ge n \ge 1$ be integers which satisfy the following conditions:

a) $[d/2] \le k$.

b) $n \ge [\log_2 k] + 1$.

Let us denote by $(k,d) \wedge G(k,n)$ the set of elements of the form $(i_1,\dots,i_n,d)$ such that $(i_1,\dots,i_n) \in G(k,n)$.

Lemma 2.3: Let $d > n \ge 1$. Denote:

$$u(d,n) = \min\{d-1,\ 2^n - 1\}, \qquad l(d,n) = \max\{[d/2],\ n\}$$

Then:

$$G(d,n+1) = \bigcup_{k=l(d,n)}^{u(d,n)} (k,d) \wedge G(k,n) \qquad (2.1)$$

Proof: Consider $(i_1,\dots,i_n,d) \in (k,d)\wedge G(k,n)$. Then $k \ge n$ by Definition 2.1; $[d/2] \le i_n = k$ by condition a) of Definition 2.2; $k \le d-1$ by Definition 2.2; and $k \le 2^n - 1$ by condition b) of Definition 2.2. Hence $l(d,n) \le k \le u(d,n)$. Moreover, $(i_1,\dots,i_n) \in G(k,n)$. Let us show that $(i_1,\dots,i_n,d) \in G(d,n+1)$. Obviously $(i_1,\dots,i_n,d) \in \{1,\dots,d\}^{n+1}$; $i_1 = 1$ because $(i_1,\dots,i_n) \in G(k,n)$, and $i_{n+1} = d$. This guarantees condition a) of Definition 2.1. Besides, $i_1 < \dots < i_n < d$ because $(i_1,\dots,i_n) \in G(k,n)$ and $k = i_n < d$; hence condition b) of Definition 2.1 also holds. Lastly, condition c) of Definition 2.1 holds for $l = 1,\dots,n-1$ because $(i_1,\dots,i_n) \in G(k,n)$.
From condition a) of Definition 2.2, condition c) of Definition 2.1 also holds for $l = n$. Hence $(i_1,\dots,i_n,d) \in G(d,n+1)$. Let us now prove the opposite inclusion.

Let $(i_1,\dots,i_{n+1}) \in G(d,n+1)$. Then $i_{n+1} = d$. Let $k$ be the element $i_n$. From condition c) of Definition 2.1 we have $[d/2] \le k$; therefore condition a) of Definition 2.2 holds. Obviously $k \le d-1$. On the other hand, $k \ge n$ because $1 = i_1 < \dots < i_n = k$. Let us prove that $(i_1,\dots,i_n) \in G(k,n)$. First, $(i_1,\dots,i_n) \in \{1,\dots,k\}^n$ because $(i_1,\dots,i_{n+1}) \in G(d,n+1)$; consequently condition a) of Definition 2.1 holds. As we have said, condition b) also holds. Condition c) of Definition 2.1 is true for $l = 1,\dots,n-1$ because, once again, $(i_1,\dots,i_{n+1}) \in G(d,n+1)$. All the above implies that $(i_1,\dots,i_n) \in G(k,n)$, and therefore $n \ge [\log_2 k] + 1$ (see Remark 1 below). Hence $l(d,n) \le k \le u(d,n)$.

As $(i_1,\dots,i_n) \in G(k,n)$, we have $(i_1,\dots,i_{n+1}) \in (k,d)\wedge G(k,n)$ for a certain $k$ with $l(d,n) \le k \le u(d,n)$. But this implies that:

$$G(d,n+1) \subseteq \bigcup_{k=l(d,n)}^{u(d,n)} (k,d)\wedge G(k,n)$$

This completes the proof.



Remarks:

1) If $u(d,n) = 2^n - 1$, then for $2^n - 1 < k \le d-1$ we have $G(k,n) = \emptyset$. It is not difficult to prove that, for $n \le k$, $G(k,n) \ne \emptyset$ if and only if $n \ge [\log_2 k] + 1$; hence, if $2^n - 1 < k$, we have $G(k,n) = \emptyset$.

2) Note that $u(d,n) = 2^n - 1$ and $l(d,n) = n$ cannot hold simultaneously. If $d - 1 \ge 2^n - 1$, then $[d/2] \ge 2^{n-1}$; but $2^{n-1} > n$ for $n > 2$, so it is impossible that $[d/2] \le n$.

From the above remarks we obtain that the following equation:

$$G(d,n+1) = \bigcup_{k=n}^{d-1} (k,d)\wedge G(k,n) \qquad (2.2)$$

is also true, with the convention that $(k,d)\wedge G(k,n) = \emptyset$ whenever condition a) of Definition 2.2 fails, i.e. when $k < [d/2]$.

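Corollary 2.4: Let $\theta(d,n) = \operatorname{card} G(d,n)$. Then:

$$\theta(d,n+1) = \sum_{k=l(d,n)}^{u(d,n)} \theta(k,n) \qquad (2.3)$$

Proof: It follows directly from Eq. (2.1), since $\operatorname{card}\bigl((k,d)\wedge G(k,n)\bigr) = \theta(k,n)$ and the sets $(k,d)\wedge G(k,n)$ for different $k$ do not intersect each other.

Remark: From the remarks below Lemma 2.3, the following expression:

$$\theta(d,n+1) = \sum_{k=[d/2]}^{d-1} \theta(k,n) \qquad (2.4)$$

is also true (the terms with $k < n$ vanish because then $G(k,n) = \emptyset$).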
TABLE I
Values of the cardinals θ(d,n) of the sets G(d,n)

d\n     2     3     4     5     6     7     8     9    10    11    12    13
 2      1
 3      1     1
 4      0     2     1
 5      0     2     3     1
 6      0     1     5     4     1
 7      0     1     6     9     5     1
 8      0     0     6    15    14     6     1
 9      0     0     6    21    29    20     7     1
10      0     0     4    26    50    49    27     8     1
11      0     0     4    30    76    99    76    35     9     1
12      0     0     2    31   105   175   175   111    44    10     1
13      0     0     2    33   136   280   350   286   155    54    11     1
14      0     0     1    30   165   415   630   636   441   209    65    12

Table I shows the values of $\theta(d,n)$ for $2 \le d \le 14$ and $2 \le n \le 13$. It can be proved (see Appendix A for the details) that:

$$\theta(d,k) \le \frac{d-4}{k-3}\binom{2d-k-3}{d-k} \qquad (2.5)$$


Let us denote $I(d) = \bigl(I - T(d,d,p)\bigr)^{-1}$ and $M(d',d) = I(d)\,T(d',d,p)$. Then we have the following:

Theorem 2.10: For every $d \in \mathbb{N}$, $d \ge 2$:

$$P(d) = \Biggl[\sum_{n=2}^{d}\ \sum_{(i_1,\dots,i_n)\in G(d,n)} M(i_{n-1},i_n)\cdots M(i_1,i_2)\Biggr] P(1) \qquad (2.6)$$

Proof: By induction on $d$. The property is true for $d = 3$:

$$P(3) = \Biggl[\sum_{(i_1,i_2)\in G(3,2)} M(i_1,i_2) + \sum_{(i_1,i_2,i_3)\in G(3,3)} M(i_2,i_3)M(i_1,i_2)\Biggr] P(1) = \bigl[M(1,3) + M(2,3)M(1,2)\bigr]P(1)$$
Let us suppose that it is true for $k = 2,\dots,d-1$ and prove that it is also true for $k = d$ (for $d \ge 4$ the sum in (1.4) starts at $[d/2] \ge 2$, and we set $T(k,d,p) = 0$ for $k < [d/2]$). Then:

$$P(d) = I(d)\sum_{k=2}^{d-1} T(k,d,p)P(k) = I(d)\sum_{k=2}^{d-1} T(k,d,p)\Biggl[\sum_{n=2}^{k}\ \sum_{(i_1,\dots,i_n)\in G(k,n)} M(i_{n-1},i_n)\cdots M(i_1,i_2)\Biggr]P(1)$$

$$= \Biggl[\sum_{k=2}^{d-1}\sum_{n=2}^{k}\ \sum_{(i_1,\dots,i_n)\in G(k,n)} M(k,d)M(i_{n-1},i_n)\cdots M(i_1,i_2)\Biggr]P(1) \qquad (2.7)$$

Let us denote:

$$\sigma(k,n) = \sum_{(i_1,\dots,i_n)\in G(k,n)} M(k,d)M(i_{n-1},i_n)\cdots M(i_1,i_2)$$

Then, interchanging the order of summation, it is not difficult to see that:

$$\sum_{k=2}^{d-1}\sum_{n=2}^{k}\sigma(k,n) = \sum_{n=2}^{d-1}\sum_{k=n}^{d-1}\sigma(k,n)$$

Hence Eq. (2.7) can be written as:

$$P(d) = \Biggl[\sum_{n=2}^{d-1}\sum_{k=n}^{d-1}\ \sum_{(i_1,\dots,i_n)\in G(k,n)} M(k,d)M(i_{n-1},i_n)\cdots M(i_1,i_2)\Biggr]P(1) \qquad (2.8)$$

Now the multi-index of the product $M(k,d)M(i_{n-1},i_n)\cdots M(i_1,i_2)$ is $(i_1,\dots,i_n,d)$. From Lemma 2.3 we have $(i_1,\dots,i_n,d) \in (k,d)\wedge G(k,n) \subseteq G(d,n+1)$, and the terms with $k < [d/2]$ vanish because $M(k,d) = 0$. From Eq. (2.2) we can write:

$$\sum_{(i_1,\dots,i_{n+1})\in G(d,n+1)} M(i_n,i_{n+1})\cdots M(i_1,i_2) = \sum_{k=n}^{d-1}\ \sum_{(i_1,\dots,i_n)\in G(k,n)} M(k,d)M(i_{n-1},i_n)\cdots M(i_1,i_2)$$

Therefore Eq. (2.8) can be written:

$$P(d) = \Biggl[\sum_{n=2}^{d-1}\ \sum_{(i_1,\dots,i_{n+1})\in G(d,n+1)} M(i_n,i_{n+1})\cdots M(i_1,i_2)\Biggr]P(1)$$

Now, making the change of variable $n = m - 1$, the above expression can be written as:

$$P(d) = \Biggl[\sum_{m=3}^{d}\ \sum_{(i_1,\dots,i_m)\in G(d,m)} M(i_{m-1},i_m)\cdots M(i_1,i_2)\Biggr]P(1)$$

We can add the term for $m = 2$ because $G(d,2) = \emptyset$ for every $d \ge 4$, so the corresponding inner sum is zero. Finally:

$$P(d) = \Biggl[\sum_{n=2}^{d}\ \sum_{(i_1,\dots,i_n)\in G(d,n)} M(i_{n-1},i_n)\cdots M(i_1,i_2)\Biggr]P(1)$$

where we have renamed $m$ as $n$. This completes the proof.
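Theorem 2.10 is combinatorial: the argument only uses the fact that $M(k,d)$ vanishes when $[d/2] > k$. The Python sketch below checks this with hypothetical 2x2 integer matrices standing in for $M(k,d) = I(d)T(k,d,p)$; it compares the recursion (1.4), written as $P(d) = \sum_k M(k,d)P(k)$, with the path sum (2.6) and its restricted form (2.9). The stand-in matrices and the starting vector are arbitrary assumptions chosen only for the check.

```python
from itertools import combinations


def mat_mul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def mat_vec(A, v):
    return tuple(sum(A[i][k] * v[k] for k in range(2)) for i in range(2))


def vec_add(u, v):
    return (u[0] + v[0], u[1] + v[1])


def M(k, d):
    """Hypothetical stand-in for M(k, d); zero when [d/2] > k, as T(k, d, p) is."""
    if d // 2 > k:
        return ((0, 0), (0, 0))
    return ((k + 1, 1), (d % 3, 2))


def G_set(d, n):
    """G(d, n) from Definition 2.1 (brute force, n >= 2)."""
    if d < n:
        return []
    out = []
    for middle in combinations(range(2, d), n - 2):
        seq = (1,) + middle + (d,)
        if all(seq[l + 1] // 2 <= seq[l] for l in range(n - 1)):
            out.append(seq)
    return out


def P_recursive(d, P1):
    """P(e) = sum_{k=1}^{e-1} M(k, e) P(k); the terms with k < [e/2] vanish."""
    P = {1: P1}
    for e in range(2, d + 1):
        acc = (0, 0)
        for k in range(1, e):
            acc = vec_add(acc, mat_vec(M(k, e), P[k]))
        P[e] = acc
    return P[d]


def P_path_sum(d, P1, n_min=2):
    """Eq. (2.6); with n_min = [log2 d] + 1 it becomes Eq. (2.9)."""
    total = (0, 0)
    for n in range(n_min, d + 1):
        for seq in G_set(d, n):
            prod = ((1, 0), (0, 1))
            for a, b in zip(seq, seq[1:]):
                prod = mat_mul(M(a, b), prod)  # later steps multiply on the left
            total = vec_add(total, mat_vec(prod, P1))
    return total


if __name__ == "__main__":
    P1 = (1, 2)
    for d in range(2, 11):
        print(d, P_recursive(d, P1), P_path_sum(d, P1), P_path_sum(d, P1, d.bit_length()))
```

Using non-commuting stand-in matrices makes the comparison sensitive to the order of the factors $M(i_{n-1},i_n)\cdots M(i_1,i_2)$.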
Remarks:

1) Some terms of Eq. (2.6) are equal to zero. As we have pointed out, $G(d,n) \ne \emptyset$ if and only if $n \ge [\log_2 d] + 1$. Moreover,

$$\sum_{(i_1,\dots,i_n)\in G(d,n)} M(i_{n-1},i_n)\cdots M(i_1,i_2) = 0 \quad \text{if } n < [\log_2 d] + 1,$$

because in each product $M(i_{n-1},i_n)\cdots M(i_1,i_2)$ there exists at least one pair $i_{r-1}, i_r$ which does not satisfy condition c) of Definition 2.1, and therefore $M(i_{r-1},i_r) = 0$. This explains why we could add the term for $n = 2$ at the end of the proof of Theorem 2.10. Hence the following expression remains true:

$$P(d) = \Biggl[\sum_{n=[\log_2 d]+1}^{d}\ \sum_{(i_1,\dots,i_n)\in G(d,n)} M(i_{n-1},i_n)\cdots M(i_1,i_2)\Biggr]P(1) \qquad (2.9)$$

2) Note that $P(d)$ depends on $P(1)$. In sequences of four symbols (such as DNA sequences), $P(1)$ is related to the dimer structure. As we have shown in [Mansilla et al., 1993] and [Mansilla & Mateo-Reig, 1995], $P(1)$ distinguishes the non-coding regions of the DNA molecule. That is why we used it in [Mansilla & Mateo-Reig, 1995] as the fitness function of an evolutionary model studied there.

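Our next step is to study the structure of the matrix $T(d',d,p)$. It can be proved that:

$$T(d',d,p) = \begin{pmatrix} T_0 & T_1 & T_1 & T_2\\ T_1 & T_0 & T_2 & T_1\\ T_1 & T_2 & T_0 & T_1\\ T_2 & T_1 & T_1 & T_0\end{pmatrix} = Q\begin{pmatrix}\pi_1 & 0 & 0 & 0\\ 0 & \pi_2 & 0 & 0\\ 0 & 0 & \pi_2 & 0\\ 0 & 0 & 0 & \pi_3\end{pmatrix}Q^{-1} = Q\,D\,Q^{-1} \qquad (2.10)$$

where $\pi_1 = T_0 + 2T_1 + T_2$, $\pi_2 = T_0 - T_2$ and $\pi_3 = T_0 - 2T_1 + T_2$ are the eigenvalues of $T(d',d,p)$ (we write $\pi_k$ for $\pi_k(d',d,p)$), and:

$$Q = \begin{pmatrix} 1 & -1 & 0 & 1\\ 1 & 0 & -1 & -1\\ 1 & 0 & 1 & -1\\ 1 & 1 & 0 & 1\end{pmatrix}, \qquad Q^{-1} = \begin{pmatrix} 1/4 & 1/4 & 1/4 & 1/4\\ -1/2 & 0 & 0 & 1/2\\ 0 & -1/2 & 1/2 & 0\\ 1/4 & -1/4 & -1/4 & 1/4\end{pmatrix}$$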
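The decomposition (2.10) can be verified with exact rational arithmetic. The Python sketch below (helper names are illustrative) builds $T(d',d,p)$ from arbitrary values of $T_0$, $T_1$, $T_2$ and checks that $Q\,D\,Q^{-1}$ reproduces it, with $D = \operatorname{diag}(\pi_1,\pi_2,\pi_2,\pi_3)$.

```python
from fractions import Fraction as F


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def T_matrix(t0, t1, t2):
    """Structure of T(d', d, p) in terms of T_0, T_1, T_2."""
    return [[t0, t1, t1, t2],
            [t1, t0, t2, t1],
            [t1, t2, t0, t1],
            [t2, t1, t1, t0]]


Q = [[1, -1, 0, 1],
     [1, 0, -1, -1],
     [1, 0, 1, -1],
     [1, 1, 0, 1]]

Q_INV = [[F(1, 4), F(1, 4), F(1, 4), F(1, 4)],
         [F(-1, 2), 0, 0, F(1, 2)],
         [0, F(-1, 2), F(1, 2), 0],
         [F(1, 4), F(-1, 4), F(-1, 4), F(1, 4)]]


def eigenvalues(t0, t1, t2):
    """(pi_1, pi_2, pi_3); pi_2 has multiplicity two."""
    return t0 + 2 * t1 + t2, t0 - t2, t0 - 2 * t1 + t2


def from_spectrum(t0, t1, t2):
    """Q D Q^{-1} with D = diag(pi_1, pi_2, pi_2, pi_3)."""
    pi1, pi2, pi3 = eigenvalues(t0, t1, t2)
    D = [[pi1, 0, 0, 0], [0, pi2, 0, 0], [0, 0, pi2, 0], [0, 0, 0, pi3]]
    return matmul(matmul(Q, D), Q_INV)


if __name__ == "__main__":
    t = (F(3, 10), F(1, 7), F(2, 9))
    print(from_spectrum(*t) == T_matrix(*t))
```

Since $Q$ does not depend on $(d',d,p)$, all the matrices $T(d',d,p)$, and hence $I(d)$ and $M(d',d)$, are diagonalised by the same change of basis; this is what allows the products in Eq. (2.6) to be handled eigenvalue by eigenvalue.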
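In the same way:

$$I(d) = Q\begin{pmatrix}(1-v_1)^{-1} & 0 & 0 & 0\\ 0 & (1-v_2)^{-1} & 0 & 0\\ 0 & 0 & (1-v_2)^{-1} & 0\\ 0 & 0 & 0 & (1-v_3)^{-1}\end{pmatrix}Q^{-1} = Q\,Z(d)\,Q^{-1}$$

where $v_1, v_2, v_3$ are the eigenvalues of the matrix $T(d,d,p)$. Using the above expression, Eq. (2.9) can be written as: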



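$$P(d) = Q\Biggl[\sum_{n=[\log_2 d]+1}^{d}\ \sum_{(i_1,\dots,i_n)\in G(d,n)} A(i_{n-1},i_n)\cdots A(i_1,i_2)\Biggr]Q^{-1}P(1)$$

where $A(i_{k-1},i_k) = Z(i_k)\,D(i_{k-1},i_k)$. Let us write:

$$P(d) = Q\,H(d)\,Q^{-1}P(1) \qquad (2.11)$$

where:

$$H(d) = \sum_{n=[\log_2 d]+1}^{d}\ \sum_{(i_1,\dots,i_n)\in G(d,n)} A(i_{n-1},i_n)\cdots A(i_1,i_2) = \begin{pmatrix} H_1(d) & 0 & 0 & 0\\ 0 & H_2(d) & 0 & 0\\ 0 & 0 & H_2(d) & 0\\ 0 & 0 & 0 & H_3(d)\end{pmatrix}$$

$$H_k(d) = \sum_{n=[\log_2 d]+1}^{d}\ \sum_{(i_1,\dots,i_n)\in G(d,n)} \delta_k(i_1,\dots,i_n) \qquad (2.12)$$

$$\delta_k(i_1,\dots,i_n) = \frac{\pi_k(i_{n-1},i_n,p)\cdots\pi_k(i_1,i_2,p)}{\bigl(1-v_k(i_n)\bigr)\cdots\bigl(1-v_k(i_2)\bigr)} \qquad (2.13)$$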
Proposition 2.11: Let $\Theta = \cdots a_{-1}a_0a_1a_2\cdots$ be an infinite string of binary symbols, and denote by $P_0(\Theta)$, $P_1(\Theta)$ the densities of zeros and ones, respectively, in $\Theta$. Then, if $P_1(\Theta) \ne 1/2$, we have $H_2(d) = 1$.

Proof: We use some properties of the probabilities $P_{\alpha,\beta}(d)$ in strings of binary symbols (see for example [Li, 1990]). For every $d \ge 1$:

$$P_{0,1}(d) = P_{1,0}(d);\qquad P_{0,0}(d) = 1 - 2P_1(\Theta) + P_{1,1}(d);\qquad P_{1,0}(d) = P_1(\Theta) - P_{1,1}(d)$$

It is not difficult to see that:

$$\begin{pmatrix} P_{0,0}(d)\\ P_{0,1}(d)\\ P_{1,0}(d)\\ P_{1,1}(d)\end{pmatrix} = \frac{1}{4}\,\mathcal{M}\bigl(P_1(\Theta),P_{1,1}(1)\bigr)\begin{pmatrix} H_1(d)\\ H_2(d)\\ H_3(d)\end{pmatrix} \qquad (2.14)$$

where:

$$\mathcal{M}\bigl(P_1(\Theta),P_{1,1}(1)\bigr) = \begin{pmatrix} 1 & 2\bigl(1-2P_1(\Theta)\bigr) & 1-4P_1(\Theta)+4P_{1,1}(1)\\ 1 & 0 & -\bigl(1-4P_1(\Theta)+4P_{1,1}(1)\bigr)\\ 1 & 0 & -\bigl(1-4P_1(\Theta)+4P_{1,1}(1)\bigr)\\ 1 & -2\bigl(1-2P_1(\Theta)\bigr) & 1-4P_1(\Theta)+4P_{1,1}(1)\end{pmatrix}$$

Now, because $P_{0,0}(d) = 1 - 2P_1(\Theta) + P_{1,1}(d)$, subtracting the last row of Eq. (2.14) from the first one gives:

$$\bigl(1 - 2P_1(\Theta)\bigr)H_2(d) = 1 - 2P_1(\Theta)$$

This completes the proof.

From Eq. (2.1) of [Li, 1990] and Eq. (2.14) we have:

$$\Gamma(d) = H_1(d) + \bigl(1 - 4P_1(\Theta) + 4P_{1,1}(1)\bigr)H_3(d) - \bigl(4P_1^2(\Theta) - 4P_1(\Theta) + 2\bigr) \qquad (2.15)$$

where $\Gamma(d)$ is the correlation function. Hence we have obtained an expression for the correlation function based on the eigenvalues of the matrices $M(i_{k-1},i_k)$. Later we will obtain upper and lower bounds for $H_1(d)$ and $H_3(d)$.
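Eq. (2.14) follows from (2.11) by inserting $P(1) = \bigl(1-2P_1+P_{1,1}(1),\ P_1-P_{1,1}(1),\ P_1-P_{1,1}(1),\ P_{1,1}(1)\bigr)^T$, where $P_1 = P_1(\Theta)$. The Python sketch below (helper names are illustrative) carries out this product with exact fractions for arbitrary $H_1$, $H_2$, $H_3$ and compares it with the right-hand side of (2.14).

```python
from fractions import Fraction as F

Q = [[1, -1, 0, 1], [1, 0, -1, -1], [1, 0, 1, -1], [1, 1, 0, 1]]
Q_INV = [[F(1, 4), F(1, 4), F(1, 4), F(1, 4)],
         [F(-1, 2), 0, 0, F(1, 2)],
         [0, F(-1, 2), F(1, 2), 0],
         [F(1, 4), F(-1, 4), F(-1, 4), F(1, 4)]]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def p_at_distance_one(P1, P11):
    """(P00, P01, P10, P11) at d = 1 from the density P1 and from P_{1,1}(1)."""
    return [1 - 2 * P1 + P11, P1 - P11, P1 - P11, P11]


def via_eq_2_11(H1, H2, H3, P1, P11):
    """P(d) = Q H(d) Q^{-1} P(1) with H(d) = diag(H1, H2, H2, H3)."""
    w = matvec(Q_INV, p_at_distance_one(P1, P11))
    w = [H1 * w[0], H2 * w[1], H2 * w[2], H3 * w[3]]
    return matvec(Q, w)


def via_eq_2_14(H1, H2, H3, P1, P11):
    """Right-hand side of Eq. (2.14)."""
    R = 1 - 4 * P1 + 4 * P11
    M = [[1, 2 * (1 - 2 * P1), R],
         [1, 0, -R],
         [1, 0, -R],
         [1, -2 * (1 - 2 * P1), R]]
    return [F(x) / 4 for x in matvec(M, [H1, H2, H3])]


if __name__ == "__main__":
    args = (F(1), F(9, 10), F(1, 3), F(2, 5), F(1, 5))
    print(via_eq_2_11(*args) == via_eq_2_14(*args))
```

Subtracting the last component from the first gives $H_2(d)\bigl(1-2P_1\bigr)$, which is the step used in the proof of Proposition 2.11.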
3 Upper and Lower Bounds for Eigenvalues.

In this section we obtain upper and lower bounds for $v_k(d)$ and $\pi_k(d',d)$ as functions of $d'$, $d$ and $p$. First, we give some definitions, which can be found in [Li, 1991a]. Fig. 10 of that paper shows all the possible cases in which two binary symbols, previously separated by distance $d'$, are at distance $d$ in the next time step. These cases are labelled $A_1$, $A_2$, $A_3$, $B_1$, $B_2$ and $C$. Their probabilities are (see Eq. B5 of Appendix B of [Li, 1991a]):



$$P(A_1) = (1-p)^2\binom{d'-1}{2d'-d+1}p^{2d'-d+1}(1-p)^{d-d'-2} \qquad (3.1a)$$

$$P(A_2) = (1-p)^2\binom{d'-1}{2d'-d}p^{2d'-d}(1-p)^{d-d'-1} \qquad (3.1b)$$

$$P(A_3) = (1-p)^2\binom{d'-1}{2d'-d-1}p^{2d'-d-1}(1-p)^{d-d'} \qquad (3.1c)$$

$$P(B_1) = p(1-p)\binom{d'-1}{2d'-d}p^{2d'-d}(1-p)^{d-d'-1} \qquad (3.1d)$$

$$P(B_2) = p(1-p)\binom{d'-1}{2d'-d-1}p^{2d'-d-1}(1-p)^{d-d'} \qquad (3.1e)$$

$$P(C) = p^2\binom{d'-1}{2d'-d-1}p^{2d'-d-1}(1-p)^{d-d'} \qquad (3.1f)$$



The coefficients of the matrix $T(d',d,p)$ can be built in terms of them (see also Appendix B):

$$T_0(d',d,p) = P(A_1) + 2P(A_2) + P(A_3) \qquad (3.2a)$$

$$T_1(d',d,p) = P(B_1) + P(B_2) \qquad (3.2b)$$

$$T_2(d',d,p) = P(C) \qquad (3.2c)$$

Now, from expressions (2.10), (3.2a), (3.2b) and (3.2c), we have:

$$\pi_1(d',d,p) = p^{2d'-d-1}(1-p)^{d-d'}\bigl\{A(d',d)p^2 + 2B(d',d)p + C(d',d)\bigr\} \qquad (3.3a)$$

$$\pi_2(d',d,p) = p^{2d'-d-1}(1-p)^{d-d'}\bigl\{A(d',d)p^2 + 2B(d',d)p(1-p) + C(d',d)(1-2p)\bigr\} \qquad (3.3b)$$

$$\pi_3(d',d,p) = p^{2d'-d-1}(1-p)^{d-d'}\bigl\{A(d',d)p^2 + 2B(d',d)p(1-2p) + C(d',d)(1-2p)^2\bigr\} \qquad (3.3c)$$






where:

$$A(d',d) = \binom{d'-1}{2d'-d+1}, \qquad B(d',d) = \binom{d'-1}{2d'-d}, \qquad C(d',d) = \binom{d'-1}{2d'-d-1}$$
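Eqs. (3.1) to (3.4) can be cross-checked with exact arithmetic. The Python sketch below (function names are illustrative; `dp` stands for $d'$) evaluates the case probabilities (3.1), assembles $T_0$, $T_1$, $T_2$ through (3.2), and compares the eigenvalues $\pi_1 = T_0+2T_1+T_2$, $\pi_2 = T_0-T_2$, $\pi_3 = T_0-2T_1+T_2$ of (2.10) with the closed forms (3.3). Binomial coefficients outside the range $0 \le k \le n$ are taken to be zero.

```python
from fractions import Fraction as F
from math import comb


def binom(n, k):
    return comb(n, k) if 0 <= k <= n else 0


def case_probs(dp, d, p):
    """P(A1), P(A2), P(A3), P(B1), P(B2), P(C) from Eqs. (3.1a)-(3.1f)."""
    q = 1 - p
    A, B, C = (binom(dp - 1, 2 * dp - d + s) for s in (1, 0, -1))
    r = 2 * dp - d - 1
    # A zero binomial coefficient means the case is impossible for this (d', d).
    pA1 = q**2 * A * p**(r + 2) * q**(d - dp - 2) if A else 0
    pA2 = q**2 * B * p**(r + 1) * q**(d - dp - 1) if B else 0
    pA3 = q**2 * C * p**r * q**(d - dp) if C else 0
    pB1 = p * q * B * p**(r + 1) * q**(d - dp - 1) if B else 0
    pB2 = p * q * C * p**r * q**(d - dp) if C else 0
    pC = p**2 * C * p**r * q**(d - dp) if C else 0
    return pA1, pA2, pA3, pB1, pB2, pC


def pis_from_T(dp, d, p):
    """Eigenvalues of T(d', d, p) built from Eqs. (3.2a)-(3.2c)."""
    pA1, pA2, pA3, pB1, pB2, pC = case_probs(dp, d, p)
    t0, t1, t2 = pA1 + 2 * pA2 + pA3, pB1 + pB2, pC
    return t0 + 2 * t1 + t2, t0 - t2, t0 - 2 * t1 + t2


def pis_closed_form(dp, d, p):
    """Eqs. (3.3a)-(3.3c)."""
    A, B, C = (binom(dp - 1, 2 * dp - d + s) for s in (1, 0, -1))
    X = p**(2 * dp - d - 1) * (1 - p)**(d - dp)
    return (X * (A * p**2 + 2 * B * p + C),
            X * (A * p**2 + 2 * B * p * (1 - p) + C * (1 - 2 * p)),
            X * (A * p**2 + 2 * B * p * (1 - 2 * p) + C * (1 - 2 * p)**2))


if __name__ == "__main__":
    p = F(1, 10)
    for d in range(10, 22):
        print(d, [float(x) for x in pis_closed_form(10, d, p)])
```

For $d = d'$ only $A_3$, $B_2$ and $C$ survive, and the three values reduce to $v_1(d')$, $v_2(d')$, $v_3(d')$ of Eqs. (3.4).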



It is not difficult to see that if $d' = d$, then the only possible cases are $A_3$, $B_2$ and $C$. Therefore the eigenvalues of the matrix $T(d',d',p)$ are:

$$v_1(d') = p^{d'-1} \qquad (3.4a)$$

$$v_2(d') = p^{d'-1}(1-2p) \qquad (3.4b)$$

$$v_3(d') = p^{d'-1}(1-2p)^2 \qquad (3.4c)$$
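Theorem 3.1: Let $\pi_1(d',d,p)$ and $\pi_3(d',d,p)$ be as defined by Eqs. (3.3a) and (3.3c). Then, if $d'$ is large enough and $0 < p < 1/2$, we have:

$$\varphi_l^1(p)\,\frac{e^{-\frac{(d'-1)(1-p)}{2p}}}{\sqrt{d'-1}} < \pi_1(d',d,p) < \varphi_u^1(p)\,\frac{e^{-\frac{(d'-1)p}{2(1-p)}}}{\sqrt{d'-1}} \qquad (3.5a)$$

$$\varphi_l^3(p)\,\frac{e^{-\frac{(d'-1)(1-p)}{2p}}}{\sqrt{d'-1}} < \pi_3(d',d,p) < \varphi_u^3(p)\,\frac{e^{-\frac{(d'-1)p}{2(1-p)}}}{\sqrt{d'-1}} \qquad (3.5b)$$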
where: 



, e 2>l -l>U(l-p) 2 e 2 "+2(l-p) 

f,(p)= 7S5o^ 

1 eP +(l-p) 2 e 1 -P +2(1- p) 

(Pu(p)= — im^T) — 
3, . e 2(1 ~ p) +(l-2p) 2 e 2p -2(1-2/?)
(Pl(p) = 



<Pu(P) = 



pnp(l-p) 
e p +(l-2p) 2 e 1 ~ p + 2(1-2/;) 



^2np{\-p) 
Proof:

Let $d_0 \in \mathbb{N}$ be such that, for every $d > d_0$, the approximation given by the Local Limit Theorem remains valid:

$$\binom{d}{r}p^r(1-p)^{d-r} \approx \frac{1}{\sqrt{2\pi p(1-p)d}}\exp\left[-\frac{1}{2}\left(\frac{r-dp}{\sqrt{dp(1-p)}}\right)^2\right]$$

($d_0$ could be 25; see page 84 of [Gnedenko, 1980]). Then, from (3.3a) and (3.3c), and under the assumption that $d' > d_0$, we have:



K X = 



w 



42np{\-p) 



2wp(\-p) , 



w+Zu 
\X\-p)j 



+(i-pye 



71-. 



42np{\-p) 



2w(l-p), 



v2d-p)y 



+ (l-2p) z e 



vv— 2m 
v2d r p)y 



^ w+2k a 
v2(l"P)y 



+ 2(l-/>) 



(3.7a) 



-2(l-2p) 



(3.7b) 



where: 



w = 1 



2d -d 



1 



p(d -1) p(d -1) 

From condition c) of Definition 2.1 we have $d'+1 \le d \le 2d'+1$; hence:

$$-\frac{1}{p(d'-1)} \le \frac{2d'-d}{p(d'-1)} \le \frac{1}{p}$$

and then:



2p < e 



^ w-2m A 
V2d-P)y 



3 
,2(1- p) <,



( w+2u A 
2(1-/5) 



From the above inequalities and (3.7a) and (3.7b) we obtain:



g 2w(1-/j) , g 2w(l-p) 

<P/(P) — r=, <ni(d ,d,p)<(p u {p)- 



4d -1 

2 

M 

~2w(l-p) 
I/O?) < 



4d -\ 

2 



n 3 (d ,d,p)<(pl(p)- 



1 



2w(1-/j) 



Let us note that: 



2w(l-p) 2(1 -p) 



1- 



2d — d 
P(d'-l) 



It is not difficult to prove that if $d'+1 \le d \le 2d'+1$, then:



1< 



2d -d 



< 



l-p 



(3.8a) 



(3.8b) 



p(d -1) P 
The above condition, together with (3.8a) and (3.8b), completes the proof.

Remark: It is easy to see that $\varphi_l(p) > 0$ if $0.0716 < p$. In all that follows we suppose that $p$ satisfies this condition.

Definition 3.2: Let $(i_1,\dots,i_n) \in G(d,n)$ and $d_0 \in \mathbb{N}$. Let us denote by $l(i_1,\dots,i_n)$ the set of indices which are smaller than $d_0$ and by $u(i_1,\dots,i_n)$ the remaining ones:

$$l(i_1,\dots,i_n) = \{i_r \in (i_1,\dots,i_n) : i_r < d_0\}$$

$$u(i_1,\dots,i_n) = \{i_r \in (i_1,\dots,i_n) : i_r \ge d_0\}$$

Definition 3.3: Let $(i_1,\dots,i_n) \in G(d,n)$. Denote:

$$c_j(i_1,\dots,i_n) = \prod_{i_k \in l(i_1,\dots,i_n)} \frac{\pi_j(i_k,i_{k+1},p)}{1 - v_j(i_{k+1})}$$

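Corollary 3.4: Under the conditions of Theorem 3.1 we have:

$$L_j(p,d_0,d,n) \le \delta_j(i_1,\dots,i_n) \le U_j(p,d_0,d,n), \qquad j = 1,3 \qquad (3.9)$$

where:

$$L_j(p,d_0,d,n) = c_j(i_1,\dots,i_n)\,\Phi_j^l(p,d_0,d,n)\,\frac{e^{-\frac{1-p}{2p}S(d,n)}}{\sqrt{M(d,n)}} \qquad (3.10a)$$

$$U_j(p,d_0,d,n) = c_j(i_1,\dots,i_n)\,\Phi_j^u(p,d_0,d,n)\,\frac{e^{-\frac{p}{2(1-p)}S(d,n)}}{\sqrt{M(d,n)}} \qquad (3.10b)$$

$$\Phi_j^l(p,d_0,d,n) = \prod_{i_k \in u(i_1,\dots,i_n)} \frac{\varphi_l^j(p)}{1 - v_j(i_{k+1})}, \qquad \Phi_j^u(p,d_0,d,n) = \prod_{i_k \in u(i_1,\dots,i_n)} \frac{\varphi_u^j(p)}{1 - v_j(i_{k+1})}$$

$$S(d,n) = \sum_{i_k \in u(i_1,\dots,i_n)} (i_k - 1); \qquad M(d,n) = \prod_{i_k \in u(i_1,\dots,i_n)} (i_k - 1)$$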

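Proof: The expression $\delta_j(i_1,\dots,i_n)$ can be written as:

$$\delta_j(i_1,\dots,i_n) = \prod_{i_k\in l(i_1,\dots,i_n)}\frac{\pi_j(i_k,i_{k+1},p)}{1-v_j(i_{k+1})}\ \prod_{i_k\in u(i_1,\dots,i_n)}\frac{\pi_j(i_k,i_{k+1},p)}{1-v_j(i_{k+1})}$$

therefore:

$$\delta_j(i_1,\dots,i_n) = c_j(i_1,\dots,i_n)\prod_{i_k\in u(i_1,\dots,i_n)}\frac{\pi_j(i_k,i_{k+1},p)}{1-v_j(i_{k+1})}$$

Now from (3.5a) and (3.5b) we have:

$$\Phi_j^l(p,d_0,d,n)\,\frac{e^{-\frac{1-p}{2p}S(d,n)}}{\sqrt{M(d,n)}} \le \prod_{i_k\in u(i_1,\dots,i_n)}\frac{\pi_j(i_k,i_{k+1},p)}{1-v_j(i_{k+1})} \le \Phi_j^u(p,d_0,d,n)\,\frac{e^{-\frac{p}{2(1-p)}S(d,n)}}{\sqrt{M(d,n)}}$$

The above inequalities complete the proof.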
4 Upper and Lower Bounds for Sum and Product of Indices.

In Corollary 3.4 we obtained upper and lower bounds for $\delta_j(i_1,\dots,i_n)$. Those expressions depend on $S(d,n)$ and $M(d,n)$. In this section we obtain upper and lower bounds for

$$S^*(d,n) = \sum_{s=2}^{n}(i_s - 1) \quad\text{and}\quad M^*(d,n) = \prod_{s=2}^{n}(i_s - 1),$$

which, together with (3.10a) and (3.10b), will be used to obtain uniform bounds for $\delta_j(i_1,\dots,i_n)$ on $G(d,n)$.

The expressions $S^*(d,n)$ and $M^*(d,n)$ reach their maximum values at those members of the set $G(d,n)$ whose last elements are consecutive, i.e. $i_n = d$, $i_{n-1} = d-1$, $i_{n-2} = d-2$, etc. This condition forces the first elements to be as sparse as possible while still fulfilling the condition $[i_{k+1}/2] \le i_k$, that is, $[i_{k+1}/2] = i_k$:

$$(i_1,\dots,i_{n-r-1},i_{n-r},\dots,i_n) = (1,3,\dots,2^{n-r-1}-1,\ d-r,\dots,d)$$

Therefore we should find an index $r$ such that:

$$2^{n-r-2} - 1 < \left[\frac{d-r-1}{2}\right], \qquad \left[\frac{d-r}{2}\right] \le 2^{n-r-1} - 1 \qquad (4.1)$$



If $d - r$ is even, the above conditions are equivalent to:

$$\begin{cases} d + 2 \le 2^{n-r} + r\\ 2^{n-(r+1)} + r + 1 \le d\end{cases} \qquad (4.2)$$

If $d - r$ is odd, conditions (4.1) are equivalent to:

$$\begin{cases} d + 1 \le 2^{n-r} + r\\ 2^{n-(r+1)} + r + 1 \le d + 1\end{cases} \qquad (4.3)$$

In order to obtain such an index $r$ we will solve the equation:

$$2^{n-x} + x = d + 1 \qquad (4.4)$$

Fig. 4.1 shows, as a function of $d$, the difference between the solutions of the equations $2^{n-x} + x = d$ and $2^{n-x} + x = d + 2$. This justifies searching for the index $r$ using (4.4).

Fig. 4.1: The difference between the solutions of $2^{n-x} + x = d$ and $2^{n-x} + x = d + 2$. The x axis shows the distance $d$.
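The statements about where $S^*(d,n)$ and $M^*(d,n)$ attain their extreme values can be explored by brute force for small $d$. The Python sketch below uses $S^*$ and $M^*$ as defined at the start of this section and compares the brute-force extrema over $G(d,n)$ with two explicit multi-indices, introduced here only for illustration: $i_l = \min\{2^l - 1,\ d - n + l\}$ (sparse first elements, consecutive last ones) and the backward construction $i_n = d$, $i_l = \max\{l, [i_{l+1}/2]\}$ (consecutive first elements, sparse last ones). All helper names are hypothetical.

```python
from itertools import combinations
from math import prod


def G_set(d, n):
    """G(d, n) from Definition 2.1 (brute force, n >= 2)."""
    if d < n:
        return []
    out = []
    for middle in combinations(range(2, d), n - 2):
        seq = (1,) + middle + (d,)
        if all(seq[l + 1] // 2 <= seq[l] for l in range(n - 1)):
            out.append(seq)
    return out


def S_star(seq):
    return sum(i - 1 for i in seq)


def M_star(seq):
    return prod(i - 1 for i in seq[1:])


def sparse_then_consecutive(d, n):
    """i_l = min(2^l - 1, d - n + l): the largest possible value at every position."""
    return tuple(min(2 ** l - 1, d - n + l) for l in range(1, n + 1))


def consecutive_then_sparse(d, n):
    """i_n = d, i_l = max(l, [i_{l+1}/2]): the smallest possible value at every position."""
    seq = [d]
    for l in range(n - 1, 0, -1):
        seq.append(max(l, seq[-1] // 2))
    return tuple(reversed(seq))


if __name__ == "__main__":
    d, n = 30, 10
    print(sparse_then_consecutive(d, n), consecutive_then_sparse(d, n))
```

For d = 30 and n = 10 the two multi-indices are (1, 3, 7, 15, 25, 26, 27, 28, 29, 30) and (1, 2, 3, 4, 5, 6, 7, 8, 15, 30). Because every term $i_s - 1$ with $s \ge 2$ is positive, a multi-index that is largest (smallest) in every position also maximises (minimises) both $S^*$ and $M^*$.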
It can be shown that the only feasible solution of (4.4) can be expressed by means of the following series expansion:

$$x = \sum_{k=1}^{\infty}\frac{d\,(\ln 2)^k P_k(d)}{k!\,(d\ln 2 - 1)^{2k-1}}\left(n - \frac{\ln d}{\ln 2}\right)^k \qquad (4.5)$$

where $P_k(d)$ is a polynomial in $d$ of degree $k-1$. The expression $n - \frac{\ln d}{\ln 2}$ is always positive because $n \ge [\log_2 d] + 1$.
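The above series can be written as:

$$x = \sum_{k=1}^{\infty}\frac{d\,P_k(d)}{k!\,(d\ln 2 - 1)^{k-1}}\left(\frac{n\ln 2 - \ln d}{d\ln 2 - 1}\right)^k$$

Let us note that:

$$0 < \frac{n\ln 2 - \ln d}{d\ln 2 - 1} \le \frac{d\ln 2 - \ln d}{d\ln 2 - 1}$$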
and it is easy to see that:



lim 



P k (d) 



d^(d\n2-l) k In 2 



Therefore, as the solution of (4.4) we will use the approximation given by the first term of the series (4.5):

$$r_u = \frac{d\ln 2}{d\ln 2 - 1}\left(n - \frac{\ln d}{\ln 2}\right) \qquad (4.6)$$

We want to remark on the accuracy of this approximation. Table II shows some pairs $(d,n)$, the exact values of $n - r$ and the approximation obtained from (4.6) (labelled $n - r_u$).

From the result obtained above we obtain upper bounds for $S^*(d,n)$ and $M^*(d,n)$:

$$S^*(d,n) \le S_u(d,n), \qquad M^*(d,n) \le M_u(d,n) \qquad (4.7)$$

where:

1 

S M (d,n) = d ^ +2J(n + 2-— ) + — (2/i - 1 + — — ■ ) (4.8) 

In 2 In 2 In 2 



(n-r M )(n-r M -l) 
M M (d,n) = 2 2 



(J-r M -l)! 



(4.9) 
TABLE II
Exact values of n - r and their estimates n - r_u

  d     n   n - r   n - r_u
 30    10     5      4.64
 30    20     4      4.14
 33    11     5      4.77
 60    20     6      5.55
 60    40     5      5.06
 62     7     6      5.92
 80    13     7      6.19
 81    18     7      6.12
120    40     7      6.50
120    80     6      6.01
240    80     8      7.47
240   160     7      6.98
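The estimates in Table II follow from (4.6) alone. The Python sketch below (function names are illustrative) evaluates (4.6) for the tabulated pairs and, for comparison, solves (4.4) numerically by bisection; the bisection routine is an addition for illustration, not a method used in the text.

```python
import math

LN2 = math.log(2)

# (d, n, exact n - r, tabulated estimate n - r_u) copied from Table II
TABLE_II = [(30, 10, 5, 4.64), (30, 20, 4, 4.14), (33, 11, 5, 4.77),
            (60, 20, 6, 5.55), (60, 40, 5, 5.06), (62, 7, 6, 5.92),
            (80, 13, 7, 6.19), (81, 18, 7, 6.12), (120, 40, 7, 6.50),
            (120, 80, 6, 6.01), (240, 80, 8, 7.47), (240, 160, 7, 6.98)]


def r_upper_approx(d, n):
    """Eq. (4.6): first term of the series (4.5)."""
    return d * LN2 / (d * LN2 - 1) * (n - math.log(d) / LN2)


def r_upper_root(d, n, tol=1e-12):
    """Continuous root of 2**(n - x) + x = d + 1, Eq. (4.4), on [0, n - 1].

    On this interval f is decreasing, f(0) = 2**n - d - 1 >= 0 whenever
    n >= [log2 d] + 1, and f(n - 1) = n - d <= 0, so bisection applies.
    """
    def f(x):
        return 2.0 ** (n - x) + x - (d + 1)

    lo, hi = 0.0, float(n - 1)
    if f(lo) < 0 or f(hi) > 0:
        raise ValueError("root not bracketed")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for d, n, exact, est in TABLE_II:
        print(d, n, exact, est, round(n - r_upper_approx(d, n), 3),
              round(n - r_upper_root(d, n), 3))
```

The tabulated $n - r_u$ values agree with (4.6) up to truncation in the second decimal. The last printed column is the continuous solution of (4.4); it is not the integer index $r$ defined through (4.1), so it need not coincide with the exact $n - r$ column.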



We proceed in the same way for the lower bound. The expressions $S^*(d,n)$ and $M^*(d,n)$ reach their minimum values at those members of the set $G(d,n)$ whose last elements are as sparse as possible. This condition forces the first elements to be consecutive, because the condition $[i_{k+1}/2] \le i_k$, $k = 1,\dots,n-1$, must be fulfilled:

$$(i_1,\dots,i_r,i_{r+1},\dots,i_n) = (1,\dots,r,i_{r+1},\dots,i_n), \qquad \left[\frac{i_{l+1}}{2}\right] = i_l;\quad l = r+1,\dots,n-1.$$

Therefore, we should find an index $r$ such that:



,n-(r+l) 



(r + l)< J + l 



J + l<2 



n — r 



(4.10) 
To obtain such an index, we solve the equation:

$$2^x(n-x) = d + 1 \qquad (4.11)$$



It can be proved that the only feasible solution of (4.11) is expressed by means of the following series expansion:

$$x = \sum_{k=1}^{\infty}\frac{(\ln 2)^k P_k(d)}{k!\,(d\ln 2 - 1)^{2k-1}}(d-n)^k \qquad (4.12)$$

where $P_k(d)$ is a polynomial of degree $k-1$. More precisely:

$$P_k(d) = (k-1)!\,(\ln 2)^{k-2}d^{k-1} + \dots + d$$

Besides, it is not difficult to see that:

$$0 \le \frac{d\ln 2 - n\ln 2}{d\ln 2 - 1} < \frac{d\ln 2 - \ln d}{d\ln 2 - 1} < 1$$

and:

$$\lim_{d\to\infty}\frac{\ln 2\;P_k(d)}{(k-1)!\,(d\ln 2 - 1)^{k-1}} = 1$$



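Hence, for $d$ large enough:

$$x \approx \frac{1}{\ln 2}\sum_{k=1}^{\infty}\frac{1}{k}\left(\frac{d\ln 2 - n\ln 2}{d\ln 2 - 1}\right)^k = \frac{1}{\ln 2}\ln\frac{d\ln 2 - 1}{n\ln 2 - 1} \qquad (4.13)$$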
Fig. 4.2: The graphs of the whole series (4.12) and of the approximation (4.13) for n = 20. The x axis shows the distance d.

We take the expression (4.13) as the approximation to the solution of (4.11). Fig. 4.2 presents the graphs of the whole series (upper plot) and of the approximation (4.13) (lower plot) for d = 1024 and 11 ≤ n ≤ 1000. Table III presents some pairs $(d,n)$, the corresponding values of $n - r$ and the approximation obtained from (4.13) (labelled $n - r_l$).



TABLE III
Exact values of n - r and their estimates n - r_l

  d     n   n - r   n - r_l
 30    10     9      8.26
 30    20    20     19.37
 33    11    11      9.27
 60    20    19     18.34
 60    40    40     39.39
 62     7     4      3.55
 80    13    11     10.23
 81    18    16     15.73
120    40    39     38.77
120    80    80     79.40
240    80    79     78.34
240   160   160    159.41
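The closed form on the right of (4.13) can be evaluated directly. The Python sketch below (function names are illustrative) computes it, checks it against a partial sum of the series in (4.13), and, for comparison only, finds the smallest root of (4.11) by bisection.

```python
import math

LN2 = math.log(2)

# (d, n, exact n - r, tabulated estimate n - r_l) copied from Table III
TABLE_III = [(30, 10, 9, 8.26), (30, 20, 20, 19.37), (33, 11, 11, 9.27),
             (60, 20, 19, 18.34), (60, 40, 40, 39.39), (62, 7, 4, 3.55),
             (80, 13, 11, 10.23), (81, 18, 16, 15.73), (120, 40, 39, 38.77),
             (120, 80, 80, 79.40), (240, 80, 79, 78.34), (240, 160, 160, 159.41)]


def r_lower_approx(d, n):
    """Closed form of the approximation (4.13)."""
    return math.log((d * LN2 - 1) / (n * LN2 - 1)) / LN2


def r_lower_series(d, n, terms=400):
    """Partial sum of the series in (4.13)."""
    z = (d - n) * LN2 / (d * LN2 - 1)
    return sum(z ** k / k for k in range(1, terms + 1)) / LN2


def r_lower_root(d, n, tol=1e-12):
    """Smallest root of 2**x * (n - x) = d + 1, Eq. (4.11), by bisection.

    g(x) = 2**x * (n - x) - (d + 1) increases on [0, n - 1/ln 2] and
    g(0) = n - d - 1 < 0; the upper end of the interval must give g >= 0.
    """
    def g(x):
        return 2.0 ** x * (n - x) - (d + 1)

    lo, hi = 0.0, n - 1 / LN2
    if g(lo) > 0 or g(hi) < 0:
        raise ValueError("root not bracketed")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for d, n, exact, est in TABLE_III:
        print(d, n, exact, est, round(n - r_lower_approx(d, n), 2),
              round(n - r_lower_root(d, n), 2))
```

Run over all twelve rows, the closed form agrees with the tabulated $n - r_l$ to within rounding except in the rows (120, 40) and (240, 80), where it gives about 38.38 and 78.40.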
From all the above:

$$S^*(d,n) \ge S_l(d,n); \qquad M^*(d,n) \ge M_l(d,n) \qquad (4.14)$$



where: 

(n-r,Vn-r, _ n _ ?.w 17/7 - »A In ? - 1 1 1 

+ 1 (4.15) 



c x (.n- ri )(.n- ri -l)-2n A(d-n)\n2-l\ 1, 

2 M h,2-i r^" 1 



Jln2-1 



M l (d,n) = d(n-r l -l)\(d + \y i 2 2 e d+l{ ' (4.16) 



5 Uniform Bounds on Sets G(d,n).

In this section, starting from (4.15) and (4.16), we obtain upper and lower bounds for $\delta_j(i_1,\dots,i_n)$ as functions of $d$ and $n$ on every set $G(d,n)$.

Definition 5.1: Let us denote:

$$c_j^u(d_0,n) = \max_{(i_1,\dots,i_n)\in G(d,n)} c_j(i_1,\dots,i_n), \qquad c_j^l(d_0,n) = \min_{(i_1,\dots,i_n)\in G(d,n)} c_j(i_1,\dots,i_n)$$

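Proposition 5.2: The following inequalities hold:

$$L_j(p,d_0,d,n) \ge c_j^l(d_0,n)\,\Phi_j^l(p,d_0,d,n)\,\Omega_l(d_0,n)\,\frac{e^{-\frac{1-p}{2p}S_u(d,n)}}{\sqrt{M_u(d,n)}} \qquad (5.1)$$

$$U_j(p,d_0,d,n) \le c_j^u(d_0,n)\,\Phi_j^u(p,d_0,d,n)\,\Omega_u(d_0,n)\,\frac{e^{-\frac{p}{2(1-p)}S_l(d,n)}}{\sqrt{M_l(d,n)}} \qquad (5.2)$$

where:

$$\Omega_l(d_0,n) = \min_{(i_1,\dots,i_n)\in G(d,n)} W_l(i_1,\dots,i_n), \qquad \Omega_u(d_0,n) = \max_{(i_1,\dots,i_n)\in G(d,n)} W_u(i_1,\dots,i_n)$$

$$W_l(i_1,\dots,i_n) = \prod_{i_k\in l(i_1,\dots,i_n),\,k\ne 1}\sqrt{i_k-1}\;e^{\frac{1-p}{2p}(i_k-1)}, \qquad W_u(i_1,\dots,i_n) = \prod_{i_k\in l(i_1,\dots,i_n),\,k\ne 1}\sqrt{i_k-1}\;e^{\frac{p}{2(1-p)}(i_k-1)}$$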



Proof: From (3.10a) and (3.10b) we have:

$$L_j(p,d_0,d,n) \ge c_j^l(d_0,n)\,\Phi_j^l(p,d_0,d,n)\,W_l(i_1,\dots,i_n)\,\frac{e^{-\frac{1-p}{2p}S^*(d,n)}}{\sqrt{M^*(d,n)}}$$

$$U_j(p,d_0,d,n) \le c_j^u(d_0,n)\,\Phi_j^u(p,d_0,d,n)\,W_u(i_1,\dots,i_n)\,\frac{e^{-\frac{p}{2(1-p)}S^*(d,n)}}{\sqrt{M^*(d,n)}}$$

From the above inequalities and expressions (4.7) and (4.14) we obtain (5.1) and (5.2).
Proposition 5.3: If 0.08 < p < 0.25 , then for d large enough and j = 1,3 : 



1 P d ^ 
n 21n2+— +' 



n 



<t>i(p,d ,d,n)<e ^ P 1 P \2n(\-p)p) 2=B u (p,d ,n) (5.3) 

_I -J ln5 

&{(p,d ,d,n)>(2jc(l-p)p) 2 e 2(.l-p) = Bl (p,d ,n) (5.4) 
Proof: If p < 0.25, we can prove that:

1 _ 1 

(pi(p)<4eH27l(l-p)p)~2 

Besides, from: 

$$1 - p^k = e^{\ln(1-p^k)} = e^{-p^k + o(p^k)}$$
we have, for d large enough: 



_^)> e -(/ + ...VH O (/°) =e "l^ ° (/? } 

«(*!,. ..,i„) 
Therefore:



<& J u (p,d ,d,n) < e 
If 0.08 < p, then:



f x ,V 

21n2+-+- 

P ] ~P 



1 1 



(2n{\-p)p) 2 



-(2n{\-p)p)^e 1( ^-P^ <<p{(p) 



and from the above: 



1 1 



— In 5 



(2n(l-p)p) 2 e 2(l- P ) <<t>j( p ,d ,d,n) 
This completes the proof. 



Proposition 5.4: Under the same conditions of Proposition 5.3 we have: 



52 <5 j (i'i,. ..,/„) <K u (p,d ,n)- 
(il,...,i n )eG(d,n) 



J S;(d,n) 

2(1-/?) (2J-n-3) ^~4^ 

v n-4 y 



52 5 j(ii,...,i n )> K l (p,d ,n)- 



M t (d,n) 

1-P 

2p 



n-3 



(5.5) 



5„(J,n) 



{il,...,i n )eG{d,n) 
where: 



M u (d,n) 



(5.6) 



K u (p,d Q ,n) = c" (d Q ,n)B u (p,d ,n)Q u (d ,n) 
K l (p,d ,n) = c l j(d ,n)B l (p,d ,n)Q l (d ,n) 

Proof:

Let $(i_1,\dots,i_n) \in G(d,n)$. Then, from Corollary 3.4 and Propositions 5.2 and 5.3, we have:

$$\delta_j(i_1,\dots,i_n) \le K_u(p,d_0,n)\,\frac{e^{-\frac{p}{2(1-p)}S_l(d,n)}}{\sqrt{M_l(d,n)}} \qquad (5.7)$$

$$\delta_j(i_1,\dots,i_n) \ge K_l(p,d_0,n)\,\frac{e^{-\frac{1-p}{2p}S_u(d,n)}}{\sqrt{M_u(d,n)}} \qquad (5.8)$$

From (5.8) we obtain (5.6) directly. From (5.7) and expression (A.9) of Appendix A we obtain (5.5). This completes the proof.
The expressions (5.5) and (5.6) can be written as: 

$$\sum_{(i_1,\ldots,i_n)\in G(d,n)} \delta_j(i_1,\ldots,i_n) \ge c_j^l(d_0,n)\,Q^l(d_0,n)\,e^{-\varepsilon^l(d,n,p)} \tag{5.9}$$

$$\sum_{(i_1,\ldots,i_n)\in G(d,n)} \delta_j(i_1,\ldots,i_n) \le c_j^u(d_0,n)\,Q^u(d_0,n)\,e^{-\varepsilon^u(d,n,p)} \tag{5.10}$$



where: 



$$\varepsilon^u(d,n,p) = \varepsilon_1^u(d,n,p) + \varepsilon_2^u(d,n,p) + \varepsilon_3^u(d,n,p) + \varepsilon_4^u(d,n,p) \tag{5.11}$$



£ l u (d,n,p)= P (d + l) 



+ In 

In 2 



2(1 -p) 
Jln2-1 



(J-n)ln2-l 



Jln2-1 



+ 



n In 2 - 1 



ln( d + 1) • 



1 



In 



Jln2-0 



n\n2 — 1 



+ ln2 



1-p 



el(d,n,p) = (n-r l -l)^^ + ln(n-r l -l)-lj + ^-(l-n) + 



+(d -n)\\n(d -n) + 



d + l 



In 2 



nlnl-l) 



£ 3 u (d,n,p) = \nd + hn(n- ri - 1) + ^ n) + ln(n - 4)(n - 7 / 2) + ln(n - 3) 



£ u (d,n,p) 



ln(J-4) 



£ t (d,n,p) = 



+ (J-4)ln(J-4) + ln(2J-n-3) + n 

$$\varepsilon^l(d,n,p) = \varepsilon_1^l(d,n,p) + \varepsilon_2^l(d,n,p) + \varepsilon_3^l(d,n,p)$$

l-p 



f i /o A 
21n2 + - + -^ — 
p l-p 

(5.12) 



1 

d d]nd + 2d 







In J 


( 


, ln<0 


n + 2 


+ 


2n — 


1 + 


I ln2 y 




In 2 


K 


In 2 J 



2/, ln2. .. hid dlnd 

£/ (d,n,p) = —(n-r u )(n-r u —V)-\ — l — \-{d — r u -1) 
£^(d,n,p) = -^{d + \n(d - r u - 1) + (d - r u - l)ln(J - r u - 1)}

It can be proved that, as $d \to \infty$, the following asymptotic expansions hold:

f\ 1 



e u (d,n,p) = e u (d,n,p) + o 
£[(d,n,p) = ef (d,n,p) + o 



d n 

(- - 
yd ' n y 



(5.13) 
(5.14) 



where: 



a , . \ pd n 
£,.(d,n,p) = — + — + n 

4(1- p) 2 

If, 1 Y\nd^ 2 

+ - 1 + 

2 v In 2 J 



2\nn + 



1 + — \lnn-W)- 
l ln2/ ; 



5-2p 
2(1- p) 



£f(d,n,p) 



v ln2y 
d 



l + (2n + 5) 



+ 



1 hid 



2 In 2 



n + 



l-p 



( d\ 
1-- 
l 2j 



V P ) 

Now from (2.11) and expressions (5.9) and (5.10) we have: 

$$H_j(d) \le \sum_{n=[\log_2 d]+1}^{d} c_j^u(d_0,n)\,Q^u(d_0,n)\,e^{-\varepsilon^u(d,n,p)} \tag{5.15}$$

$$H_j(d) \ge \sum_{n=[\log_2 d]+1}^{d} c_j^l(d_0,n)\,Q^l(d_0,n)\,e^{-\varepsilon^l(d,n,p)} \tag{5.16}$$



From the above inequalities and Eq. (2.15), upper and lower bounds for the correlation function can be obtained.



6 The Exponent of the Major Contributing Term. 

In this section we prove that a certain set $G(d,n)$ makes the major contribution to the function $H_j(d)$. We also prove the uniqueness of such a set $G(d,n)$.

If the symbols $\alpha$, $\beta$ are at distance $d$, then there are $d - 1$ other symbols between them. Each of these symbols is either mutated (probability $p$), leaving one symbol, or duplicated (probability $1 - p$), leaving two symbols. Hence, in the next time step there will be, on average,

$$p(d-1) + 2(1-p)(d-1) = (d-1)(2-p)$$

symbols between the offspring of $\alpha$ and $\beta$. Taking into account the six possible cases mentioned in Sec. 3, it is easy to prove that the expected value of the distance is $d' = (d-1)(2-p) + 2$. Therefore, for a certain value of $n$ there exists an element $(i_1,\ldots,i_n) \in G(d,n)$ such that, for every $k = 1,\ldots,n-1$: $i_{k+1} = [(2-p)(i_k - 1) + 2]$.
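The averaging step above can be checked with a short simulation. The Python sketch below is illustrative only: it assumes, as in Li's expansion-modification rule, that each of the $d - 1$ symbols between $\alpha$ and $\beta$ is independently mutated with probability $p$ (leaving one symbol) or duplicated with probability $1 - p$ (leaving two), and it does not model the six boundary cases of Sec. 3. The names `count_between` and `mean_count_between` are hypothetical.

```python
import random


def count_between(d: int, p: float, rng: random.Random) -> int:
    """One time step for the d - 1 symbols lying between two symbols at distance d.

    Assumption (Li's expansion-modification rule): each symbol is mutated with
    probability p, leaving one symbol, or duplicated with probability 1 - p,
    leaving two symbols. Returns the number of symbols after the step.
    """
    return sum(1 if rng.random() < p else 2 for _ in range(d - 1))


def mean_count_between(d: int, p: float, trials: int = 5000, seed: int = 1) -> float:
    """Monte Carlo estimate of the expected number of intermediate symbols."""
    rng = random.Random(seed)
    return sum(count_between(d, p, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    d, p = 101, 0.1
    exact = p * (d - 1) + 2 * (1 - p) * (d - 1)  # equals (d - 1)(2 - p)
    print(f"simulated: {mean_count_between(d, p):.2f}, formula: {exact:.2f}")
```

For $d = 101$ and $p = 0.1$ the simulated mean should lie close to $(d-1)(2-p) = 190$.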
Definition 6.1: Let $d \in \mathbb{N}$ and $0 < p < 1$. Let $n(d,p)$ be the positive integer such that the set $G(d,n(d,p))$ contains the element $(i_1,\ldots,i_{n(d,p)})$ which satisfies the following condition:

for every $k = 1,\ldots,n(d,p) - 1$: $i_{k+1} = [(2-p)(i_k - 1) + 2]$.
Proposition 6.2: For $p \to 0$ and $d \to \infty$ we have:

$$n(d,p) = 1 + \left[\frac{\ln(d(1-p) + p)}{\ln(2-p)}\right] \tag{6.1}$$



Proof: Since $i_{k+1} = [(i_k - 1)(2-p) + 2]$, we have $i_{k+1} = (2-p)\,i_k + p + \varepsilon_k$, where $0 \le \varepsilon_k < 1$. It can be proved by induction that:

$$i_k = (2-p)^{k-1} + p\,\frac{(2-p)^{k-1} - 1}{1-p} + \varepsilon_1 (2-p)^{k-2} + \cdots + \varepsilon_{k-1};$$

in particular, for $k = n(d,p)$:

$$d = (2-p)^{n(d,p)-1} + p\,\frac{(2-p)^{n(d,p)-1} - 1}{1-p} + \varepsilon_1 (2-p)^{n(d,p)-2} + \cdots + \varepsilon_{n(d,p)-1}.$$

From the above equation and the condition imposed on the $\varepsilon_j$ we have:

$$n(d,p) \le 1 + \frac{\ln(d(1-p) + p)}{\ln(2-p)} < n(d,p) + \frac{\ln 2 + \ln\left(1 - \dfrac{1}{2(2-p)^{n(d,p)-1}}\right)}{\ln(2-p)}.$$

From the last expression, for $d \to \infty$ and $p \to 0$, we obtain (6.1).
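The two-sided bound in the proof only uses the recursion $i_{k+1} = (2-p)i_k + p + \varepsilon_k$ with $0 \le \varepsilon_k < 1$, so it can be checked numerically without fixing the meaning of the bracket $[\cdot]$. The Python sketch below is a hypothetical test setup, not part of the model: it draws arbitrary $\varepsilon_k$, starts from $i_1 = 1$ (which is what the closed form for $i_k$ gives at $k = 1$), compares the recursion with that closed form, and evaluates the three members of the inequality.

```python
import math
import random


def chain(n: int, p: float, eps: list[float]) -> list[float]:
    """i_1, ..., i_n from i_1 = 1 and i_{k+1} = (2 - p) i_k + p + eps_k."""
    seq = [1.0]
    for k in range(n - 1):
        seq.append((2 - p) * seq[-1] + p + eps[k])
    return seq


def closed_form(k: int, p: float, eps: list[float]) -> float:
    """Closed form for i_k from the proof (eps[0] plays the role of epsilon_1)."""
    q = 2 - p
    return (q ** (k - 1) + p * (q ** (k - 1) - 1) / (1 - p)
            + sum(eps[j] * q ** (k - 2 - j) for j in range(k - 1)))


def sandwich(n: int, p: float, d: float) -> tuple[float, float, float]:
    """(n, 1 + ln(d(1-p)+p)/ln(2-p), upper bound) as in the proof of Prop. 6.2."""
    q = 2 - p
    middle = 1 + math.log(d * (1 - p) + p) / math.log(q)
    upper = n + (math.log(2) + math.log(1 - 1 / (2 * q ** (n - 1)))) / math.log(q)
    return float(n), middle, upper


if __name__ == "__main__":
    rng = random.Random(7)
    n, p = 12, 0.1
    eps = [0.99 * rng.random() for _ in range(n - 1)]
    d = chain(n, p, eps)[-1]
    print(d, closed_form(n, p, eps), sandwich(n, p, d))
```

Only the inequality chain is exercised here; the passage to (6.1) as $d \to \infty$, $p \to 0$ is not.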
Remark: Let us note that:

$$\lim_{p \to 0} n(d,p) = 1 + \left[\frac{\ln d}{\ln 2}\right] = 1 + [\log_2 d], \qquad \lim_{p \to 1} n(d,p) = d,$$

in agreement with the fact that $1 + [\log_2 d] \le n \le d$.
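Formula (6.1) is easy to evaluate numerically. The Python sketch below reads the bracket $[\cdot]$ as the integer part (floor), which is an assumption; under that reading it reproduces $1 + [\log_2 d]$ for small $p$ and stays within $1 + [\log_2 d] \le n \le d$. The function name `n_estimate` is hypothetical, and the formula is only claimed asymptotically (large $d$, small $p$).

```python
import math


def n_estimate(d: int, p: float) -> int:
    """Formula (6.1): n(d, p) = 1 + [ln(d(1 - p) + p) / ln(2 - p)].

    Assumption: [.] is read as the integer part (floor). The paper states the
    formula asymptotically (d large, p small); this is only a numerical sketch.
    """
    return 1 + math.floor(math.log(d * (1 - p) + p) / math.log(2 - p))


if __name__ == "__main__":
    d = 1000
    for p in (1e-9, 0.01, 0.1, 0.5, 0.9):
        print(p, n_estimate(d, p))
    print("1 + [log2 d] =", 1 + math.floor(math.log2(d)))
```

The estimate grows with $p$, moving from the lower end $1 + [\log_2 d]$ towards the upper end $d$ of the admissible range.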
7 Discussion.



In Fig. 7.1 we show the graph of $\varepsilon(d,n(d,p),p)$ for $p = 0.1$ (lower plot) and the exponent obtained by averaging 10 simulations with the same value of $p$ (upper plot). We want to remark on the coincidence in shape of both plots. In [Chatzidimitriou-Dreismann & Larhammar, 1993] a quantity inversely proportional to the correlation function is studied. For that quantity a fit of the form $F(d) = \ldots$ is obtained, and Fig. 1 of that paper shows a plot of the exponent $\varphi(d)$. We also want to emphasise the coincidence in shape between that exponent and the lower plot in Fig. 7.1 of the present work.




[Figure 7.1: lower and upper plots against d, for d up to 1000.]

Fig. 7.1: The graph of $\varepsilon(d,n(d,p),p)$ for $p = 0.1$ (lower plot) and the experimental exponent obtained by averaging 10 simulations with the same value of $p$ (upper plot). Compare with Fig. 1 of [Chatzidimitriou-Dreismann & Larhammar, 1993].
In Fig. 1 of [Buldyrev et al., 1995] the averaged power spectra of all 33301 coding and all 29453 noncoding sequences in GenBank longer than 512 bp are shown. As the authors remark, there are three spectral regimes. In our opinion, the existence of several exponents is the best explanation for that behaviour.
8 Conclusion.



We have studied the correlation function of an expansion-modification system which grasps the main features of the mutational process occurring in the evolution of the DNA molecule. We obtain bounds for the exponent in the correlation function and show the resemblance between the theoretical exponent and those obtained from simulation. We also give an explanation for the existence of several regions in the power spectra of real sequences.
The authors are thankful to G. Martinez-Meckler, R. Bujalich and P. Miramontes for their helpful comments, and to Carlos Gonzalez, chairman of Digintec Corp., for his valuable software support. The authors also thank an anonymous referee for useful comments. This work was partially supported (R. Mansilla) by DGAPA-UNAM, Mexico.

Appendix A: The Proof of Inequality (2.5). 

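Lemma A.1: Under the same conditions as in Lemma 2.3, let $S$ be the set:

$$S = \{(n,d) \in \mathbb{N}^2 : d > n > 1,\ [d/2] > n,\ d \text{ even}\}$$

then:

$$\theta(d,n+1) = \begin{cases} \theta(d-1,n) + \theta(d-1,n+1) - \theta(d/2-1,n), & (n,d) \in S,\\ \theta(d-1,n) + \theta(d-1,n+1), & (n,d) \notin S. \end{cases}$$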



Proof:

We analyse several cases:

I- If $u(d,n) = d - 1$, then:

$$\theta(d,n+1) = \sum_{k=l(d,n)}^{d-2} \theta(k,n) + \theta(d-1,n) \tag{A.1}$$
I.1- If $l(d,n) = n$, then from $u(d,n) = d - 1$ it follows straightforwardly that $u(d-1,n) = d-2$, and because $l(d-1,n) = l(d,n)$ we have:

$$\theta(d-1,n+1) = \sum_{k=l(d,n)}^{d-2} \theta(k,n) \tag{A.2}$$

Now from (A.1) we have:

$$\theta(d,n+1) = \theta(d-1,n+1) + \theta(d-1,n) \tag{A.3}$$

I.2- If $l(d,n) = [d/2]$, then it is not difficult to see that:

$$[(d-1)/2] = \begin{cases} [d/2] - 1, & d \text{ even},\\ [d/2], & d \text{ odd}. \end{cases} \tag{A.4}$$

From the above it follows that if $d$ is odd, $l(d,n) = l(d-1,n)$, and therefore we obtain (A.2) and (A.3).

If $d$ is even, we analyse two cases:

I.2.1- If $d/2 = n$, then $d/2 - 1 < n$ and:

$$l(d-1,n) = n = d/2 = l(d,n);$$

from the above equation we obtain (A.2) and (A.3).

I.2.2- If $d/2 - 1 \ge n$, then $l(d-1,n) = d/2 - 1$; therefore $l(d-1,n) = l(d,n) - 1$. From the last equation we have:

$$\theta(d-1,n+1) - \theta(d/2-1,n) = \sum_{k=l(d,n)}^{d-2} \theta(k,n)$$

Now, from (A.1):

$$\theta(d,n+1) = \theta(d-1,n) + \theta(d-1,n+1) - \theta(d/2-1,n)$$
II- Suppose that $u(d,n) = 2^n - 1$. Hence $2^n \le d$. Besides, from the Remark 3 that follows Lemma 2.3 we have $l(d,n) = [d/2]$. We analyse two cases:

II.1- $2^n = d$; then $u(d,n) = d - 1$ and we are in case I.

II.2- $2^n < d$; then $n \le [\log_2(d-1)]$ and therefore $\theta(d-1,n) = 0$.

Besides, because $2^n < d$, we have $l(d-1,n) = [(d-1)/2]$ and:

$$\theta(d-1,n+1) = \sum_{k=[(d-1)/2]}^{2^n-1} \theta(k,n),$$

hence:

$$\theta(d,n+1) = \begin{cases} \theta(d-1,n+1), & d \text{ odd},\\ \theta(d-1,n+1) - \theta(d/2-1,n), & d \text{ even}. \end{cases}$$

We obtain the above condition from (A.4). This completes the proof.

From Lemma A.1 we have $\theta(d,n+1) \le \theta(d-1,n) + \theta(d-1,n+1)$. In order to obtain an upper bound for $\theta(d,n)$, we study the following recursively defined sequence:

$$\omega(d,n) = \omega(d-1,n-1) + \omega(d-1,n) \tag{A.5}$$

with the following boundary conditions:
C1- $\omega(d,2) = 0$ for all $d \ge 4$;
C2- $\omega(d,d-1) = d - 2$ for all $d \ge 3$.

It is not difficult to see that $\theta(d,n) \le \omega(d,n)$. From now on we derive some properties of $\omega(d,n)$.

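Let us consider the vector $[\omega(d,d-1), \omega(d,d-2), \ldots, \omega(d,3)]^t$. From (A.5) and conditions C1 and C2 we have:

$$\begin{pmatrix} \omega(d,d-1)\\ \omega(d,d-2)\\ \vdots\\ \omega(d,3) \end{pmatrix} = \begin{pmatrix} d-2\\ 0\\ \vdots\\ 0 \end{pmatrix} + \begin{pmatrix} B_{11} & B_{12}\\ B_{21} & B_{22} \end{pmatrix} \begin{pmatrix} 0\\ \omega(d-1,d-2)\\ \vdots\\ \omega(d-1,3) \end{pmatrix}$$

where:

$$B_{11} = [0]_{1\times 1}, \quad B_{12} = [0 \cdots 0]_{1\times(d-4)}, \quad B_{21} = [0 \cdots 0]^t_{(d-4)\times 1},$$

$$B_{22} = \begin{pmatrix} 1 & 1 & 0 & \cdots & 0\\ 0 & 1 & 1 & \ddots & \vdots\\ \vdots & \ddots & \ddots & \ddots & 0\\ \vdots & & \ddots & 1 & 1\\ 0 & \cdots & \cdots & 0 & 1 \end{pmatrix}_{(d-4)\times(d-4)}$$

that is, $B_{22}$ has ones on its main diagonal and superdiagonal and zeros elsewhere.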
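Lemma A.2: Let $d \ge 5$. Then we have:

$$\begin{pmatrix} \omega(d,d-1)\\ \omega(d,d-2)\\ \vdots\\ \omega(d,3) \end{pmatrix} = b_{d-4} + \sum_{k=0}^{d-5} A_{d-4}A_{d-5}\cdots A_{d-k-4}\, b_{d-k-5} \tag{A.6}$$

where:

$$A_{d-r} = \begin{pmatrix} B_{11} & B_{12}\\ B_{21} & B_{22} \end{pmatrix}, \quad B_{11} = [0]_{(r-3)\times(r-3)}, \quad B_{12} = B_{21}^t = [0]_{(r-3)\times(d-r)},$$

with $B_{22}$ the $(d-r)\times(d-r)$ matrix with ones on the main diagonal and on the superdiagonal (first row $1\ 1\ 0 \cdots 0$) and zeros elsewhere, and

$$b_{d-r} = [0 \cdots 0 \;\; d-r+2 \;\; 0 \cdots 0]^t_{(d-3)\times 1},$$

where the nonzero element of $b_{d-r}$ is in position $r - 3$.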
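Proof: By induction on $d$. For $d = 5$ we have:

$$\begin{pmatrix} \omega(5,4)\\ \omega(5,3) \end{pmatrix} = \begin{pmatrix} 3\\ 0 \end{pmatrix} + \begin{pmatrix} 0 & 0\\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0\\ \omega(4,3) \end{pmatrix} = b_1 + A_1 b_0,$$

since $\omega(4,3) = 2$ by C2. Suppose that (A.6) holds for $d$; we prove that it also holds for $d + 1$:

$$\begin{pmatrix} \omega(d+1,d)\\ \omega(d+1,d-1)\\ \vdots\\ \omega(d+1,3) \end{pmatrix} = \begin{pmatrix} d-1\\ 0\\ \vdots\\ 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 & \cdots & \cdots & 0\\ 0 & 1 & 1 & & \vdots\\ \vdots & & \ddots & \ddots & \\ \vdots & & & 1 & 1\\ 0 & \cdots & \cdots & 0 & 1 \end{pmatrix} \begin{pmatrix} 0\\ \omega(d,d-1)\\ \omega(d,d-2)\\ \vdots\\ \omega(d,3) \end{pmatrix}$$

By the induction hypothesis, the last $d - 3$ components of the vector $[0, \omega(d,d-1), \ldots, \omega(d,3)]^t$ can be written, using (A.6), as $b_{d-4} + \sum_{k=0}^{d-5} A_{d-4}\cdots A_{d-k-4}\, b_{d-k-5}$, so that:

$$\begin{pmatrix} \omega(d+1,d)\\ \vdots\\ \omega(d+1,3) \end{pmatrix} = \begin{pmatrix} d-1\\ 0\\ \vdots\\ 0 \end{pmatrix} + A_{d-3}\left(b_{d-4} + \sum_{k=0}^{d-5} A_{d-4}\cdots A_{d-k-4}\, b_{d-k-5}\right),$$

which is (A.6) for $d + 1$. Here we have added to the elements of $b_{d-4} + \sum_{k=0}^{d-5} A_{d-4}\cdots A_{d-k-4}\, b_{d-k-5}$ a row and/or a column of zeros to obtain the same dimension. This completes the proof.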
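Lemma A.2 can be checked numerically for small $d$. The Python sketch below uses NumPy and hypothetical helper names: `omega_table` fills $\omega(d,n)$ from (A.5) with C1 and C2, `a_matrix` and `b_vector` build $A_{d-r}$ and $b_{d-r}$ as described in the lemma (zero-padded to dimension $d - 3$; the 1-based position $r - 3$ becomes the 0-based index $r - 4$), and `rhs_a6` evaluates the right-hand side of (A.6).

```python
import numpy as np


def omega_table(d_max: int) -> dict[tuple[int, int], int]:
    """omega(d, n) from recursion (A.5) with boundary conditions C1 and C2."""
    w: dict[tuple[int, int], int] = {}
    for d in range(3, d_max + 1):
        w[(d, d - 1)] = d - 2                  # C2, d >= 3
        if d >= 4:
            w[(d, 2)] = 0                      # C1, d >= 4
        for n in range(3, d - 1):              # interior values 3 <= n <= d - 2
            w[(d, n)] = w[(d - 1, n - 1)] + w[(d - 1, n)]
    return w


def a_matrix(d: int, r: int) -> np.ndarray:
    """A_{d-r}: (d-3)x(d-3), a zero block of size r-3, then ones on the main
    diagonal and superdiagonal of the trailing (d-r)x(d-r) block."""
    m = d - 3
    a = np.zeros((m, m), dtype=np.int64)
    for j in range(r - 3, m):
        a[j, j] = 1
        if j + 1 < m:
            a[j, j + 1] = 1
    return a


def b_vector(d: int, r: int) -> np.ndarray:
    """b_{d-r}: length d-3, single nonzero entry d-r+2 at 1-based position r-3."""
    b = np.zeros(d - 3, dtype=np.int64)
    b[r - 4] = d - r + 2
    return b


def rhs_a6(d: int) -> np.ndarray:
    """b_{d-4} + sum_{k=0}^{d-5} A_{d-4} ... A_{d-k-4} b_{d-k-5}, i.e. (A.6)."""
    total = b_vector(d, 4)
    prod = np.eye(d - 3, dtype=np.int64)
    for k in range(d - 4):                     # k = 0, ..., d-5
        prod = prod @ a_matrix(d, k + 4)       # append the factor A_{d-k-4}
        total = total + prod @ b_vector(d, k + 5)
    return total


if __name__ == "__main__":
    w = omega_table(12)
    for d in range(5, 13):
        lhs = [w[(d, d - j)] for j in range(1, d - 2)]  # omega(d, d-1), ..., omega(d, 3)
        print(d, lhs == rhs_a6(d).tolist())
```

Each printed line should report `True`, i.e. the vector obtained from the recursion coincides with the right-hand side of (A.6).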

Lemma A.3: Under the same conditions as in Lemma A.2, let $a_{ij}$, $1 \le i, j \le d - 3$, be the coefficients of the matrix $A_{d-4}\cdots A_{d-r}$. Then:

$$a_{ij} = \begin{cases} \binom{r-4}{i-2}, & 2 \le i \le r-2 \text{ and } j = r-2,\\ \binom{r-3}{r-3-j+i}, & r-1 \le j \le d-3 \text{ and } j-r+3 \le i \le j,\\ 0 & \text{in the other cases.} \end{cases} \tag{A.7}$$



Proof: By induction on $r$. The property is obviously true for $r = 4$. Suppose that it is true for $r$; we prove that it is also true for $r + 1$. Let $b_{ij}$ denote the coefficients of the matrix $A_{d-(r+1)}$; then:

$$b_{ij} = \begin{cases} 1, & i = j,\ j = r-1,\ldots,d-3,\\ 1, & i = j-1,\ j = r,\ldots,d-3,\\ 0 & \text{in the other cases.} \end{cases}$$

Let $c_{ij}$ be the coefficients of the matrix $A_{d-4}\cdots A_{d-r}A_{d-(r+1)}$. Then, if $j = r - 1$:

$$c_{i(r-1)} = \sum_{k=1}^{d-3} a_{ik} b_{k(r-1)} = a_{i(r-1)} b_{(r-1)(r-1)} = \binom{r-3}{r-3-(r-1)+i} = \binom{r-3}{i-2}$$

for $2 \le i \le r-1$. In the other cases $c_{i(r-1)} = 0$. Besides, if $r \le j \le d-3$, then:

$$c_{ij} = \sum_{k=1}^{d-3} a_{ik} b_{kj} = a_{i(j-1)} b_{(j-1)j} + a_{ij} b_{jj} = \binom{r-3}{r-2-j+i} + \binom{r-3}{r-3-j+i} = \binom{r-2}{r-2-j+i}$$

for $j - r + 2 \le i \le j$. In the other cases $c_{ij} = 0$. This completes the proof.
Corollary A.4: Under the same assumptions as in Lemma A.3:

$$A_{d-4}\cdots A_{d-r}\, b_{d-(r+1)} = (d-r+1)\left[0,\ \binom{r-4}{0},\ \binom{r-4}{1},\ \ldots,\ \binom{r-4}{r-4},\ 0,\ \ldots,\ 0\right]^t$$

Proof: The only nonzero element of the vector $b_{d-(r+1)}$ is in position $r - 2$ and has value $d - r + 1$. From Lemma A.3, column $r - 2$ of the matrix $A_{d-4}\cdots A_{d-r}$ is:

$$\left[0,\ \binom{r-4}{0},\ \binom{r-4}{1},\ \ldots,\ \binom{r-4}{r-4},\ 0,\ \ldots,\ 0\right]^t.$$

This completes the proof.
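Proposition A.5: Let $d \ge 5$. Then:

$$\omega(d,d-i) = \sum_{n=i-2}^{d-5} (d-n-3)\binom{n}{i-2} \quad \text{for } i \ge 2 \tag{A.8}$$

Proof: From Lemma A.2 and Corollary A.4 we have:

$$\begin{pmatrix} \omega(d,d-1)\\ \omega(d,d-2)\\ \vdots\\ \omega(d,3) \end{pmatrix} = \begin{pmatrix} d-2\\ 0\\ \vdots\\ 0 \end{pmatrix} + \sum_{k=0}^{d-5} (d-k-3)\left[0,\ \binom{k}{0},\ \binom{k}{1},\ \ldots,\ \binom{k}{k},\ 0,\ \ldots,\ 0\right]^t$$

From the above expression we obtain the result immediately.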
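Formula (A.8) can likewise be compared with the recursion (A.5) for small $d$. In the Python sketch below (hypothetical helper names), `omega_table` fills $\omega(d,n)$ from (A.5) with C1 and C2, and `omega_a8` evaluates the right-hand side of (A.8) for $2 \le i \le d - 3$, i.e. for $\omega(d,k)$ with $3 \le k \le d - 2$.

```python
from math import comb


def omega_table(d_max: int) -> dict[tuple[int, int], int]:
    """omega(d, n) from recursion (A.5) with boundary conditions C1 and C2."""
    w: dict[tuple[int, int], int] = {}
    for d in range(3, d_max + 1):
        w[(d, d - 1)] = d - 2                  # C2, d >= 3
        if d >= 4:
            w[(d, 2)] = 0                      # C1, d >= 4
        for n in range(3, d - 1):              # interior values 3 <= n <= d - 2
            w[(d, n)] = w[(d - 1, n - 1)] + w[(d - 1, n)]
    return w


def omega_a8(d: int, i: int) -> int:
    """Right-hand side of (A.8): sum_{n=i-2}^{d-5} (d - n - 3) * C(n, i - 2)."""
    return sum((d - n - 3) * comb(n, i - 2) for n in range(i - 2, d - 4))


if __name__ == "__main__":
    w = omega_table(15)
    print(all(omega_a8(d, i) == w[(d, d - i)]
              for d in range(5, 16) for i in range(2, d - 2)))
```

The component $\omega(d,d-1) = d - 2$ is fixed directly by C2 and is therefore not part of the comparison.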



From (A.8) it can be proved that, for $k > 3$:

$$\omega(d,k) = \frac{2d-k-3}{k-3}\binom{d-4}{d-k}$$

and from the above result and the inequality $\theta(d,n) \le \omega(d,n)$ we have:

$$\theta(d,k) \le \frac{2d-k-3}{k-3}\binom{d-4}{d-k} \tag{A.9}$$
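The closed form for $\omega(d,k)$, which is the right-hand side of (A.9), can be checked in the same way. The sketch below uses exact rational arithmetic (`fractions.Fraction`) so that the factor $(2d-k-3)/(k-3)$ introduces no rounding; the helper names are again hypothetical.

```python
from fractions import Fraction
from math import comb


def omega_table(d_max: int) -> dict[tuple[int, int], int]:
    """omega(d, n) from recursion (A.5) with boundary conditions C1 and C2."""
    w: dict[tuple[int, int], int] = {}
    for d in range(3, d_max + 1):
        w[(d, d - 1)] = d - 2                  # C2, d >= 3
        if d >= 4:
            w[(d, 2)] = 0                      # C1, d >= 4
        for n in range(3, d - 1):              # interior values 3 <= n <= d - 2
            w[(d, n)] = w[(d - 1, n - 1)] + w[(d - 1, n)]
    return w


def omega_closed(d: int, k: int) -> Fraction:
    """(2d - k - 3)/(k - 3) * C(d - 4, d - k), valid for 3 < k <= d - 1."""
    return Fraction(2 * d - k - 3, k - 3) * comb(d - 4, d - k)


if __name__ == "__main__":
    w = omega_table(20)
    print(all(omega_closed(d, k) == w[(d, k)]
              for d in range(5, 21) for k in range(4, d)))
    print(omega_closed(20, 10))  # right-hand side of (A.9) for d = 20, k = 10
```

Since $\theta(d,k) \le \omega(d,k)$, the value returned by `omega_closed` is the bound that (A.9) places on $\theta(d,k)$, and the one used in (5.5) with $\binom{d-4}{d-n} = \binom{d-4}{n-4}$.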